Abstract

Introduction: Transforming growth factor-βs (TGF-βs) play a dual role in breast cancer, with context-dependent
tumor-suppressive or pro-oncogenic effects. TGF-β antagonists are showing promise in early-phase clinical oncology
trials to neutralize the pro-oncogenic effects. However, there is currently no way to determine whether the
tumor-suppressive effects of TGF-β are still active in human breast tumors at the time of surgery and treatment, a
situation that could lead to adverse therapeutic responses. 

Methods: Using a breast cancer progression model that exemplifies the dual role of TGF-β, promoter-wide chromatin
immunoprecipitation and transcriptomic approaches were applied to identify a core set of TGF-β-regulated genes that
specifically reflect only the tumor-suppressor arm of the pathway. The clinical significance of this signature and the 
underlying biology were investigated using bioinformatic analyses in clinical breast cancer datasets, and knockdown 
validation approaches in tumor xenografts. 

Results: TGF-β-driven tumor suppression was highly dependent on Smad3, and Smad3 target genes that were
specifically enriched for involvement in tumor suppression were identified. Patterns of Smad3 binding reflected
the preexisting active chromatin landscape, and target genes were frequently regulated in opposite directions in vitro
and in vivo, highlighting the strong contextuality of TGF-β action. An in vivo-weighted TGF-β/Smad3 tumor-suppressor
signature was associated with good outcome in estrogen receptor-positive breast cancer cohorts. TGF-β/Smad3 effects
on cell proliferation, differentiation and ephrin signaling contributed to the observed tumor suppression. 

Conclusions: Tumor-suppressive effects of TGF-β persist in some breast cancer patients at the time of surgery and
affect clinical outcome. Carefully tailored in vitro/in vivo genomic approaches can identify such patients for exclusion
from treatment with TGF-β antagonists.
Introduction

Gene expression profiling approaches have highlighted the 
molecular heterogeneity of breast cancer [1], and identi- 
fied gene expression fingerprints of molecular pathway 
activation [2]. A greater understanding of the contribution 
of different signaling pathways will be critical for the 
development of precision medicine approaches to cancer 
therapy. Transforming growth factor-βs (TGF-βs) are
highly pleiotropic regulatory proteins that play complex 
roles in epithelial carcinogenesis, and the prevailing 
dogma is that they switch from a predominantly tumor- 
suppressive role to a tumor-promoting role as disease 
progresses (reviewed in [3-5]). Based on encouraging 
preclinical data showing that the deleterious pro-oncogenic
arm of the TGF-β biological response program
can be effectively blockaded for therapeutic benefit,
TGF-β antagonists are now in early-phase clinical trials
in oncology in several tumor types [6], including breast 
cancer (http://clinicaltrials.gov Trial NCT01401062). How- 
ever, the specter remains that such interventions could 
inadvertently interfere with residual tumor suppressive 
activity and thus adversely affect outcome. Here we have 
asked if genomic approaches can be used to discern 
whether tumor-suppressive effects of TGF-β do indeed
persist and influence survival in any human breast cancers 
at the time of surgery. 

Mechanisms underlying the dual role model for TGF-β
in cancer progression involve a wide variety of TGF-β
effects on both the tumor parenchyma and the support- 
ing stromal microenvironment. Tumor-suppressive effects 
include the induction of various protective responses to 
counteract genetic damage and oncogene activation [7-10], 
as well as the maintenance of a tumor-suppressive cytokine 
and chemokine profile in the microenvironment [11-13]. 
However as disease progresses, activation of oncogenic 
pathways in the tumor parenchyma can not only override 
the tumor-suppressive responses to TGF-β, but can also
unmask pro-progression responses such as induction of 
the epithelial-to-mesenchymal transition, enhanced 
migration and invasion, and expansion of the cancer 
stem cell compartment [14-18]. At the same time, the 
excessive TGF-β that is frequently found in the microenvironment
of advanced tumors can subvert antitumor
immune surveillance, promote angiogenesis, and gener- 
ally contribute to the development of a more supportive 
tumor stroma [19-21]. 

Preclinical studies in model systems have provided
considerable support for a dual role for TGF-β in breast
cancer (reviewed in [22,23]). TGF-β was shown to switch
from tumor suppressor to prometastatic factor with disease
progression in both a HER2/Neu-driven genetically
engineered mouse model and a Ras-driven human xenograft
model of breast cancer [24,25]. In contrast, studies
in the MMTV-PyVT mouse model of breast cancer have
suggested that tumor-suppressive effects of the TGF-β
pathway may persist even in late-stage metastatic disease
[26-28]. Currently, the relative importance of the two
different aspects of TGF-β biology in determining
clinical outcome in human breast cancer patients is not
clear. TGF-β pathway components are rarely mutated or
deleted in breast cancer [29], so the effects of any TGF-β
pathway perturbation in the clinical situation are likely
to be more subtle than those seen with preclinical knockout
models. Interestingly, the majority of human breast cancer
cell lines have lost their growth inhibitory responses
to TGF-β in vitro [30], and only MCF10Ca1h cells have
been definitively shown to retain tumor-suppressive
responses to endogenous TGF-β in vivo [25]. This situation
could reflect an early loss of TGF-β-driven tumor
suppression in the majority of human breast cancers. Indeed,
reduced expression of the type II TGF-β receptor
has been seen in epithelial hyperplasia without atypia,
the very earliest preneoplastic lesion of the breast [31].
Alternatively, it may simply reflect challenges in establishing
cell lines from breast cancers that retain such
responses.

The possible persistence of tumor-suppressive effects
of TGF-β in human breast cancers at the time of clinical
intervention has profoundly important implications for
the deployment of TGF-β-targeted therapies. To address
this question rigorously, we chose to develop a gene
signature for TGF-β-driven tumor suppression. Several
TGF-β-related gene expression signatures have been developed
previously [2,11,18,32-36], but they were not designed
a priori to discriminate between tumor-suppressive
and pro-progression responses to TGF-β. Furthermore,
such signatures are almost invariably associated with poor
prognosis, suggesting that the pro-oncogenic activities of
the TGF-β pathway are more readily captured by these approaches
than are the tumor-suppressive activities. Here
we describe an integrated in vitro/in vivo genomic strategy
to identify a gene signature that specifically reflects
TGF-β-driven tumor-suppressive effects on the tumor parenchyma.
We show that high expression of the signature
predicts good outcome in clinical datasets from estrogen
receptor-positive (ER+) breast cancer patients, a finding
that suggests there is a subset of such patients who should
not be treated with TGF-β pathway antagonists. Our
approach also revealed novel aspects of TGF-β biology,
highlighting effects of TGF-β on breast cancer differentiation
and linking ephrin signaling to TGF-β-mediated
tumor suppression.

Methods 

Cell lines and reagents 

The MCF10A-derived cell lines [37,38] were obtained
from the Barbara Ann Karmanos Cancer Institute Cell
Line Resource (Detroit, MI, USA) and were cultured as
previously described [25]. MCF10A ('M1') cells are spontaneously
immortalized human breast epithelial cells derived
from a woman with fibrocystic breast disease [39].
MCF10AT1k.cl2 ('M2') was derived from MCF10A by
transfection with mutant Ha-Ras followed by in vivo
selection for cells that gave rise to preneoplastic lesions
and subsequent in vitro cloning from the lesions.
MCF10Ca1h ('M3') and MCF10Ca1a.cl1 ('M4') were cultured
from rare tumors that arose from the premalignant
MCF10AT1k.cl2 cells following implantation in vivo
[37,38]. Although the MCF10A parental cell line is ER-negative,
the derivative Ras-transformed cell lines show
varying degrees of ER positivity and biological responses
to estrogen in vitro and in vivo [40-43]. In particular, we
and others have shown that tumors derived from M3 cells
contain ER+ cells in the differentiated regions ([38] and
Additional file 1) and thus model ER+ breast cancer. They
are genetically wild type for p53 but have mutated
PIK3CA [44], which are also characteristics of human
ER+ breast cancers [45,46]. Smad2 and Smad3 null conditionally
immortalized mouse mammary epithelial cells
(IMECs) were generated and cultured as described previously
[47]. For all in vitro assays, cells were grown to
50 to 60% confluence and then serum-starved in DMEM/F12
supplemented with Serum Replacement 1 (Sigma-Aldrich,
St. Louis, MO, USA) for 16 hours prior to TGF-β
treatment. Recombinant human EphrinA1-Fc chimera
was purchased from R&D Systems Inc. (Minneapolis,
MN, USA).

ShRNA knockdown and gene overexpression 

For gene knockdown experiments, short hairpin RNAs
(shRNAs) against target sequences in SMAD2
(5'-GCACTTGCTCTGAAATTTG-3') and SMAD3
(5'-GGCCATCACCACGCAGAAC-3') were cloned into the pLKO.1
lentiviral vector. shRNA against GFP RNA
(5'-AAGACCCGCGCCGAGGTGAAG-3') was used as a control.
EFNA1 shRNA lentiviral constructs (TRCN0000007311
in pLKO.1; V3LHS_360202, V3LHS_360203 in pGIPZ)
were purchased from Open Biosystems Inc. (Huntsville,
AL, USA). A dominant negative human type II TGF-β
receptor (dnTβRII) consisting of amino acids 1-219 of
TGFBR2 coupled to a Myc or V5 tag was cloned into
pLPCX retroviral (BD Biosciences Clontech, Palo Alto,
CA, USA) or pLenti6.2/V5-DEST Gateway lentiviral
(Invitrogen, Carlsbad, CA, USA) backbones, respectively.
Lentiviral constructs were transfected into the 293FT
producer cell line using the pPACKH1 lentivector packaging
kit (System Biosciences, Mountain View, CA,
USA) and Lipofectamine 2000 (Invitrogen). Pseudoviral
particles were isolated by centrifugation and incubated
with M3 and M4 cells for 24 hours. The cells were
grown for an additional 48 hours after transduction and
then maintained under puromycin selection (4 µg/ml)
for five days. In vivo experiments were performed within
three to five passages after transduction using pools of
transduced cells. Knockdown or overexpression was confirmed
by RT-QPCR and Western blot analysis. Activity of
the dnTβRII was confirmed by its ability to block
TGF-β-inducible Smad2 phosphorylation and TGF-β-inducible
activity of the Smad3 reporter CAGA12-luciferase, following
transient transfection with pGL3(CAGA12)-Luc.

Western blotting 

Whole-cell extracts were prepared in M-PER (Thermo
Fisher Scientific Inc., Pittsburgh, PA, USA) containing
Complete Protease Inhibitor cocktail (Roche Applied
Science, Indianapolis, IN, USA) and phenylmethylsulfonyl
fluoride (Sigma-Aldrich). Typically, 35 µg of total
protein was loaded onto 4 to 20% Tris-Glycine SDS-PAGE
gels (Invitrogen) and transferred to PVDF membrane
(Millipore, Temecula, CA, USA). Antibodies were as
follows: Smad3 antibody (Ab28379, ChIP grade, Abcam,
Cambridge, MA, USA), Smad2 antibody (15-1300, Invitrogen),
anti-Smad3 phosphoS423/425 antibody (Smad3-CP,
1880-1, Epitomics, Burlingame, CA, USA), anti-linker
phospho S208 Smad3 antibody (Smad3-LP, generously
provided by Dr. Fang Liu, Rutgers University, NJ,
USA), anti-Ephrin-A1 antibody (3880-1, Epitomics or
SC-911, Santa Cruz Biotechnology Inc., Santa Cruz,
CA, USA), anti-Eck/EphA2, clone7 (05-480, Millipore), and
anti-phosphoEphA2-S897 (035118, US Biological, Salem,
MA, USA). Anti-β-actin (A-2228, Sigma-Aldrich) was
used to assess equivalence of protein loading. Peroxidase-conjugated
secondary antibodies were used at 1:5000
dilution and the signals were detected by ECL (Thermo
Scientific Pierce, Rockford, IL, USA). Scanned Western
blots were quantitated using MultiGauge Analysis Software
(Fujifilm, Tokyo, Japan).

Chromatin immunoprecipitation (ChIP) and methylated 
DNA immunoprecipitation (MeDIP) 

Cells were grown to 50 to 60% confluence in complete
medium and then switched to DMEM/F12 medium containing
Serum Replacement 1 (Sigma-Aldrich) at a final
concentration of 1× for 16 hours before treatment with
5 ng/ml of TGF-β1 or vehicle for 1 hour. A total of 1 × 10^8 cells
from vehicle- or TGF-β-treated cultures were washed
once with PBS and then dual cross-linked at room
temperature, successively with 2 mM di-(N-succinimidyl)
glutarate (DSG, Thermo Scientific) for 30 minutes
and 1% formaldehyde for 10 minutes. We found that the
DSG cross-linking step significantly increased the recovery
of TGF-β-induced Smad3-bound DNA when
compared with formaldehyde fixation alone. Glycine
was added to a final concentration of 0.25 M to stop the
fixation. Cells were washed twice with ice-cold PBS,
scraped, snap-frozen in liquid nitrogen and stored at -80°C



Sato et al. Breast Cancer Research 2014, 16:R57 
http://breast-cancer-research.com/content/16/3/R57






until further processing. For Smad3 ChIP, cells were resuspended
and sonicated on ice in 3 ml of SDS lysis buffer (1% SDS,
10 mM EDTA, 50 mM Tris-HCl, pH 8.0) to a fragment
size of 200 to 500 bp using a Misonix™ sonicator
with the following pulse parameters: time on = 15 seconds,
time off = 10 seconds, total sonication time = 10 minutes,
power level = 3 to 4. The lysates were centrifuged at
15,000 × g for 10 minutes, diluted 10-fold with ChIP
dilution buffer (0.01% SDS, 16.7 mM Tris-HCl, pH 8.0,
1.2 mM EDTA, 1.1% Triton™ X-100, 167 mM NaCl) and
precleared with Dynabead™ Protein A (Invitrogen) for 30
minutes, followed by incubation with 3 µg/ml anti-Smad3
antibody (#28379, ChIP grade, Abcam) or rabbit immunoglobulin
G (IgG) overnight at 4°C. Protein A
beads were added and incubated for 1 hour at 4°C, and
beads were then washed successively with low-salt (0.1%
SDS, 20 mM Tris-HCl, pH 8.0, 2 mM EDTA, 1% Triton™
X-100, 150 mM NaCl) and high-salt (0.1% SDS, 20 mM
Tris-HCl, pH 8.0, 2 mM EDTA, 1% Triton™ X-100, 500 mM
NaCl) wash buffers, LiCl wash buffer (10 mM
Tris-HCl, pH 8.0, 1 mM EDTA, 1% NP-40, 1% deoxycholic
acid sodium salt, 0.25 M LiCl) and TE buffer.
ChIPed DNA was eluted in SDS elution buffer at room
temperature (RT) with occasional vortexing and the
cross-linking was reversed by overnight incubation at
65°C.

Smad3 ChIPed DNA was amplified using Whole
Genome Amplification and Reamplification kits (Sigma-Aldrich).
ChIP-QPCR for known Smad3 target genes
(JUNB, SERPINE1, SMAD7) confirmed successful amplification.
Amplified ChIPed DNA was biotinylated according
to the standard Affymetrix protocol (Affymetrix
Chromatin Immunoprecipitation Assay Protocol). Following
fragmentation, 10 µg of biotinylated DNA was hybridized
for 16 hours at 45°C to an Affymetrix promoter tiling
array (GeneChip™ Human Promoter 1.0R Array; 25,500
promoters, 25-mer probes, Affymetrix, Santa Clara, CA,
USA) according to the manufacturer's instructions. GeneChips™
were washed and stained in the Affymetrix Fluidics
Station 450, and then scanned using an Affymetrix GeneChip™
Scanner 3000 7G. Data were collected using Affymetrix
AGCC software. Each ChIP experiment was conducted
in quadruplicate (M1 cells and M2 cells) or duplicate (M3
cells and M4 cells) using independent chromatin isolations.

Histone H3 with acetylation on lysine residues 9/14 
(H3AcK9/14) has previously been shown to be highly lo- 
calized to the 5' regions of transcriptionally active hu- 
man genes [48] or genes that are poised for transcription 
[49]. To determine the chromatin activation state of se- 
lect Smad3 binding regions, ChIP analysis for acetylated 
histones in untreated M1 to M4 cells grown in complete
medium was carried out as previously described [50], 
using a ChIP assay kit (Upstate, Temecula, CA, USA), 
with antibodies against H3AcK9/14 (antibody #06-599,
Upstate). QPCR was performed to determine enrichment
of target genomic regions in the immunoprecipitated
fraction compared with input DNA. Similarly, to
assess the methylation status of DNA targets, immunoprecipitation
of methylated DNA (MeDIP) was performed.
A total of 1 µg of whole DNA from all four cell lines (M1 to
M4 untreated, complete medium) was sonicated in TE 
buffer (Sonicator, Misonix Inc., NY, USA; setting: power 
3, 30 seconds ON, 20 seconds OFF, four times on ice). 
DNA samples were denatured at 99°C for 5 minutes then 
snap-cooled in iced water. Anti-5-methyl cytosine poly- 
clonal antibody (CP51000, rabbit, Megabase Research 
Products, Lincoln, NE, USA) was incubated with the soni- 
cated DNA for 2 hours at 4°C, followed by incubation 
with Dynabead Protein A (Invitrogen) for 1 hour. Super- 
natant was collected as unbound fraction, and beads 
were then washed three times with MeDIP washing 
buffer (10 mM Tris-HCl, 150 mM NaCl, 0.05% Triton 
X-100 and 0.01% BSA) and once with TE buffer. Immu- 
noprecipitated DNA was eluted in SDS elution buffer 
(1% SDS, 10 mM Tris-HCl, pH 8.0, 5 mM EDTA, 300 
mM NaCl), at RT for 10 minutes, and then incubated at 
55°C for 3 hours with proteinase K. DNA was extracted 
using Qiagen Enzymatic reaction clean-up kit (Qiagen, 
Germantown, MD, USA) to give the MeDIP fraction. 
QPCR analysis was performed to determine enrichment 
of genomic regions in MeDIP fractions, normalized to 
unbound DNA. 

ChIP-QPCR and MeDIP QPCR

Smad3 occupancy at previously known targets as well
as at select target genes identified by ChIP-chip was
validated by QPCR of Smad3 ChIPed DNA following
whole genome amplification, using Power SYBR Green
PCR master mix (Applied Biosystems, Carlsbad, CA,
USA) with an ABI PRISM 7900 Sequence Detection System
(Applied Biosystems). The enrichment of ChIP
DNA was calculated relative to the input DNA using
gene-specific primer sets to the Smad binding regions
(SBRs) identified from the ChIP-chip analysis, and was
compared with ChIP for control IgG. Similarly, QPCR of
MeDIP DNA and of DNA ChIPed for histone H3AcK9/14
was performed for select SBRs. PPIA was used as a nega- 
tive control for Smad3 ChIP and MeDIP and a positive 
control for H3AcK9/14 ChIP. Conversely, MyoD served 
as a positive control for MeDIP and a negative control for 
H3AcK9/14 ChIP. The primer pairs used for QPCR are 
given in Additional file 2. 

Smad3 ChIP-chip data analysis

Affymetrix Tiling Analysis Software (TAS, version 1.2.0)
was used for data processing. Quantile normalization
was performed within each comparison group (quadruplicate
for M1 and M2 cells and duplicate for M3 and
M4 cells). Signal or local P values were computed using a
window size of 200 bp (bandwidth = 100 bp), and a
minimum run of 200 bases with a maximum gap of 100
bases. Relative enrichment of TGF-β-dependent Smad3
binding was estimated from the signal difference between
vehicle-treated and TGF-β-treated samples with a
false discovery rate (FDR) of 15%. The analyzed data were
visualized using Integrated Genome Browser [51] for each
cell line. Enriched regions that overlapped with or were
within 10 kb upstream or downstream of known promoter
regions (Affymetrix Hs_PromP_NCBIv36.accession)
were retained. Galaxy web server [52] and UCSC
Genome Browser tables ('hg18.knownGene') were used
to annotate enriched regions with target transcripts of
known genes. Genes were assigned to bound genomic
regions based on the gene transcription start site (TSS)
distance to the midpoint of the Smad3 binding region
(SBR). Both multiple target transcripts and the gene with
the shortest TSS distance to the region midpoint were
analyzed. On average, 72% of all enriched regions for
each of the four cell lines were annotated with known
genes.
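
The nearest-TSS assignment described above can be illustrated with a minimal sketch. This is not the Galaxy/UCSC workflow used in the study; the `Gene` and `Region` types, the function names and the choice to apply the 10 kb limit directly to the TSS-to-midpoint distance are assumptions made for illustration, and strand is ignored.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int  # transcription start site coordinate (hypothetical input)


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def region_midpoint(region: Region) -> int:
    """Midpoint of a Smad3 binding region (SBR)."""
    return (region.start + region.end) // 2


def nearest_gene(region: Region, genes: List[Gene],
                 max_dist: int = 10_000) -> Optional[Tuple[Gene, int]]:
    """Return the gene whose TSS is closest to the SBR midpoint.

    Assumption: only genes on the same chromosome and within
    max_dist bp of the midpoint are considered.
    """
    mid = region_midpoint(region)
    best: Optional[Tuple[Gene, int]] = None
    for gene in genes:
        if gene.chrom != region.chrom:
            continue
        dist = abs(gene.tss - mid)
        if dist <= max_dist and (best is None or dist < best[1]):
            best = (gene, dist)
    return best
```

Returning `None` when no TSS lies within range mirrors the observation that only a fraction of enriched regions could be annotated with known genes.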

To generate a probability density plot for the occur- 
rence of SBRs in relation to the closest TSS, binding 
sites within the -7.5 kb to +2.5 kb genomic region for 
the target genes were used. Data for each cell line was 
analyzed independently. The distributions were obtained 
using kernel density estimates [53] implemented in R 
[54]. To determine the distribution of SMAD binding 
sites within the SBRs, a set of high confidence positive 
peaks identified in both M3 and M4 cells was used. After 
merging overlapping peaks, the final set had 190 regions. 
Sequences consisting of peak midpoints ±1,000 bp were
divided into 100 bp bins. The bins of each peak were scored
for the presence of one or more SMAD binding sites. Data
for bins equidistant from the midpoint (for example 100 
bp upstream and 100 bp downstream) were combined. 
The fraction of bins with SMAD sites was compared at 
equivalent bin positions for the 190 SBRs and 190 ran- 
domly selected promoter regions that did not show 
Smad3 binding. Two-way ANOVA was performed to ask 
whether the fraction of peaks with a SMAD site differed 
significantly with respect to group (positive vs. negative 
peaks) and distance from peak midpoint. 
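
As a sketch of the bin scoring described above, the following assumes that SMAD site positions have already been determined and are supplied as offsets (in bp) relative to each peak midpoint. The function name and input format are hypothetical; motif detection and the two-way ANOVA are not reproduced here.

```python
from typing import List, Sequence


def mirrored_bin_fractions(site_offsets_per_peak: Sequence[Sequence[int]],
                           half_width: int = 1000,
                           bin_size: int = 100) -> List[float]:
    """Fraction of bins containing at least one site, per distance class.

    Each peak contributes two bins per distance class (one upstream,
    one downstream of the midpoint); mirrored bins are pooled.
    """
    if not site_offsets_per_peak:
        raise ValueError("at least one peak is required")
    n_classes = half_width // bin_size
    hits = [0] * n_classes
    for offsets in site_offsets_per_peak:
        occupied = set()  # (side, distance class) pairs with >= 1 site
        for off in offsets:
            if -half_width <= off < half_width:
                side = 0 if off < 0 else 1
                dist = (-off - 1) if off < 0 else off
                occupied.add((side, dist // bin_size))
        for _, cls in occupied:
            hits[cls] += 1
    total_bins = 2 * len(site_offsets_per_peak)
    return [h / total_bins for h in hits]
```

Running the same function on bound and on control regions gives two series of fractions that can be compared at equivalent distances from the midpoint.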

To identify enriched transcription factor (TF) binding 
sites within the SBRs, the set of 190 binding regions iden- 
tified above was used. The midpoint of each peak was de- 
termined, and chromosome sequences consisting of the 
midpoint nucleotide flanked by 250 bp were examined for 
the presence of transcription factor binding sites. If neces- 
sary, boundaries of regions were adjusted to eliminate 
overlaps between adjacent sequences. Transcription factor 
binding sites were identified using the Genomatix software
suite [55]. Overrepresented TF binding sites within
peaks were identified using the RegionMiner tool. Co-occurrence
of TF sites was determined within peak
midpoint ±250 bp regions for SMAD and the six
additional most highly overrepresented TF matrices in
the 190 binding regions. MatInspector was used to
identify TF binding sites in each binding region. Redun- 
dant TF sites were removed (for example a generic 
SMAD site and a SMAD3 site mapped to the same lo- 
cation were counted as one site) and multiple motifs 
for each TF were collapsed down to a single motif (for 
example AP1.01 and AP1.02 were reduced to AP1). For
each pair of TF factors, the number of peak sequences 
that contained both sites was determined. P values for 
co-occurrence were calculated using Fisher's exact test,
where TF site co-occurrence in positive peaks was compared
to co-occurrence in the set of 190 control promoter
regions which showed no Smad3 binding. P values were 
adjusted for multiple comparisons using the Bonferroni 
correction. 
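
The co-occurrence test can be sketched in a few lines of standard-library Python. The 2 × 2 table layout (bound peaks vs. control regions, both sites present vs. not) and the use of a two-sided test are assumptions; the text does not state which alternative hypothesis was used.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a, b: bound peaks with / without both TF sites
    c, d: control regions with / without both TF sites
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x co-occurrences in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates to 34/70, about 0.486, the textbook value for that table.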

Global gene expression analysis 

In parallel with the Smad3 ChIP-chip analysis, M1 to
M4 cells were treated in vitro with TGF-β for 1 hour
and 6 hours and RNA was isolated for gene expression
analysis using the RNeasy™ kit (Qiagen). RNA quality was
checked on an Agilent Bioanalyzer 2100 (Agilent Technologies,
Santa Clara, CA, USA). All samples used for microarray
analysis had a high quality score (RIN >9). A total
of 100 ng of RNA was reverse transcribed and amplified
using an Ambion WT expression kit following the manufacturer's
instructions. Sense strand cDNA was fragmented
and biotinylated using the Affymetrix WT Terminal
Labeling Kit. Three biological replicates for each condition
were hybridized to the Affymetrix GeneChip™
Human ST1.0 in a hybridization oven at 45°C, 60 rpm
for 16 hours. Washing and staining were performed on an
Affymetrix Fluidics Station 450 using the Affymetrix GeneChip™
Hybridization Wash and Stain Kit containing
R-phycoerythrin, streptavidin and biotinylated anti-streptavidin
antibody, and GeneChips were then scanned on an Affymetrix
GeneChip™ Scanner 3000 7G. Data were collected using
Affymetrix AGCC software. To assess the regulation of target
genes by TGF-β in M3 cells in vivo, gene expression
arrays were also performed for tumor xenografts of M3
cells transduced with pLPCX retrovirus with no insert
(M3-CON) or pLPCX expressing a dominant negative
type II TGF-β receptor (M3-dnTβRII). Six tumors were
arrayed for each genotype group. All expression arrays
were normalized by the RMA method using the
Affymetrix Expression Console.

RT-QPCR 

RNA prepared by the RNeasy method (Qiagen) from M3
tumors transduced with lentiviruses expressing shGFP
(M3-shCON), shSmad3 (M3-shSmad3) or dnTβRII was
also analyzed by RT-QPCR. cDNA was synthesized using
Superscript III (Invitrogen) and RT-QPCR was performed
using Power SYBR Green PCR Master Mix (Applied Biosystems)
and an ABI PRISM™ 7900 HT Sequence Detection
System (Applied Biosystems). Three biological replicates of
each tumor type or each in vitro cell treatment condition
were used for all RT-QPCR validation experiments. Expression
was normalized to PPIA and fold change in expression
was calculated relative to the indicated conditions. Primer
pairs used for RT-QPCR are given in Additional file 3.
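
The text does not state how fold change was computed. One common way to normalize to a reference gene such as PPIA is the comparative Ct (2^-ΔΔCt) calculation, sketched below purely as an assumption, with hypothetical argument names and an assumed amplification efficiency of 100%.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the comparative Ct method.

    ct_ref_* are Ct values for the reference gene (here, PPIA);
    assumes each PCR cycle doubles the product.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize sample
    delta_control = ct_target_control - ct_ref_control   # normalize control
    return 2.0 ** -(delta_sample - delta_control)
```

With target Ct values of 24 (sample) and 26 (control) and a reference Ct of 18 in both, the function returns 4.0, that is, a four-fold increase relative to the control condition.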

Integration of ChIP-chip and gene expression array
datasets and signature generation

To identify the TGF-β/Smad3-dependent gene expression
signature, Smad3 ChIP-chip data and microarray-based
gene expression datasets were integrated using
Partek Genomics Suite 6.5 (Partek, St. Louis, MO, USA).
Gene expression datasets were log2 transformed and quantile
normalized. ANOVA was then performed on
the set of Smad3 target genes that had been identified by
ChIP-chip, to select for TGF-β/Smad3 target genes that
were differentially expressed, based on FDR <0.05, across
the four target conditions in vitro (M3 cells -/+TGF-β and
M4 cells -/+TGF-β). Unsupervised hierarchical cluster
analysis was then performed on the genes showing the most
variation in expression across the four treatment groups, as
determined above, using the Euclidean distance
metric and average linkage clustering for both 'genes' and
'groups', and the results were used to generate heat maps. Smad3 target
genes that showed significant regulation by TGF-β treatment
in vitro were then assessed for regulation in vivo by
comparing gene expression array datasets generated from
M3-CON and M3-dnTβRII tumors (six tumors/genotype
group) on the Affymetrix Human 1.0 ST array platform.
An FDR cutoff of 0.2 was used to identify differentially
expressed genes in the in vivo datasets. Despite the low
stringency cutoff, all 26 genes identified by microarray
as being uniquely regulated by TGF-β in M3 cells
in vitro and also regulated by TGF-β in M3 tumors
in vivo were validated by RT-QPCR (see Results).

Meta-analysis of gene expression in clinical breast cancer 
datasets 

Meta-analysis of the association of individual genes or 
the gene expression signatures with clinical parameters 
and outcome in human breast cancer array datasets was 
performed using the GSA tumors function in the online 
tool GOBO (gene expression-based outcome for breast 
cancer online) [56,57]. The tumor dataset used for the 
GOBO analyses consists of a total of 1,881 samples, with
the following characteristics: 1,225 ER+ tumors, 395 ER-
tumors, 927 tumors from patients who received no systemic
therapy (untreated) and 326 tamoxifen-treated
tumors. All tumors were arrayed on the Affymetrix
U133A array and came from 11 independent datasets. Im- 
portantly, the GOBO analysis tool allows for directional 
weighting (positive or negative) of component genes of a 
signature. Thus, genes that were upregulated by TGF-β
were assigned a weight of +1, and genes that were down- 
regulated were assigned a weight of -1. Depending on the 
analysis, we used the directional weightings that we deter- 
mined in vitro or in vivo, as indicated in the text. For gene 
sets such as the TGF-β/Smad3 tumor-suppressor signature
(TSTSS), the program computes an averaged gene set
expression, including weights, prior to dividing the entire 
dataset into patient cohorts based on gene expression 
quantiles. In our analyses, datasets were dichotomized to 
high (above median) and low (below median) expression 
values for the gene or gene set in question, and Kaplan- 
Meier analysis was performed to determine association of 
gene expression with outcome. Multivariate analyses, and 
analyses of gene expression patterns across clinical groups 
were also performed with the GOBO tool. 
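
The directional weighting and median split described above can be sketched as follows. This is not the GOBO implementation; the function names, the dictionary-based input format and the handling of genes missing from a dataset are assumptions for illustration only.

```python
from statistics import median
from typing import Dict


def signature_score(expression: Dict[str, float],
                    weights: Dict[str, int]) -> float:
    """Weighted mean expression of signature genes in one sample.

    weights: +1 for genes upregulated by TGF-beta, -1 for
    downregulated genes. Genes absent from the dataset are skipped.
    """
    genes = [g for g in weights if g in expression]
    if not genes:
        raise ValueError("no signature genes found in this sample")
    return sum(weights[g] * expression[g] for g in genes) / len(genes)


def dichotomize(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each sample 'High' if its score exceeds the cohort median."""
    cut = median(scores.values())
    return {s: ("High" if v > cut else "Low") for s, v in scores.items()}
```

The resulting High and Low groups are the inputs to a Kaplan-Meier comparison, which is not reproduced here.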

To assess the performance of the TSTSS in independent 
breast cancer gene expression datasets obtained using 
different microarray platforms, the Nederlands Kanker 
Instituut (NKI) dataset using a custom spotted cDNA 
array [58], and the Cancer Genome Atlas (TCGA) dataset
[59] using an Illumina microarray platform were used
(Illumina Inc., San Diego, CA, USA). The NKI dataset 
has 337 samples (249 ER+ and 88 ER- tumors) and 19 
out of the 26 genes of the TSTSS were found in this 
dataset, while the TCGA dataset has 525 breast tumors 
(407 ER+ and 118 ER-) and 24 of the 26 TSTSS genes 
were found. We applied the R package 'survival' [54] to
estimate the probability of distant metastasis-free sur- 
vival and overall survival in these datasets using the 
Kaplan-Meier method. For each sample, we computed 
the weighted sum of expression of the genes of the 
TSTSS. From these sums, we defined a factor variable 
with value High if the sum was greater than the median 
of the sums, and Low if the sum was not greater than 
the median. The factor variable and the log rank test 
were used to test the difference between the two 
groups. The analysis was applied to all samples, and to 
ER+ and ER- subgroups. To assess the validity of the 
signature, permutation analysis was performed for the 
weight vectors and genes of the TSTSS within the data- 
set GSE6532. In this dataset, expression data were 
found for 22 out of the 26 genes. The number of all possible
(-1 or 1) binary weight vectors is therefore 2^22 = 4,194,304. The Kaplan-Meier
analyses of the signatures using all possible weight vectors 
showed that the in vivo TSTSS combination outperformed 
97% of all possible combinations. We also tested 10,000 
random subsets of the 22 genes with random binary vectors 
to combine them. The in vivo TSTSS combination outper- 
formed 98% of the 10,000 random subset signatures. 
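
The bookkeeping behind the permutation analysis can be sketched as below. The statistic used to decide that one weight vector "outperforms" another is not specified in the text; the sketch assumes a single number per candidate signature where larger is better (for example, a log-rank test statistic), and all function names are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def count_sign_vectors(n_genes: int) -> int:
    """Number of distinct (-1 or +1) weight vectors for n_genes genes."""
    return 2 ** n_genes


def all_sign_vectors(n_genes: int) -> Iterator[Tuple[int, ...]]:
    """Lazily enumerate every (-1 or +1) weight vector."""
    return product((-1, 1), repeat=n_genes)


def fraction_outperformed(observed_stat: float,
                          candidate_stats: Sequence[float]) -> float:
    """Fraction of candidate signatures beaten by the observed one.

    Assumption: a larger statistic means better separation of the
    High and Low survival groups.
    """
    if not candidate_stats:
        raise ValueError("no candidate statistics supplied")
    beaten = sum(1 for s in candidate_stats if observed_stat > s)
    return beaten / len(candidate_stats)
```

`count_sign_vectors(22)` gives 4194304, matching the count quoted above for the 22 genes available in GSE6532.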
Gene expression indices and correlation analyses

A published metaPCNA index [60] was used as a surro- 
gate for cellular proliferation in the gene expression 
datasets. The metaPCNA index is the median expression 
of the top 1% of genes that were most highly positively 
correlated with the proliferation marker proliferating cell 
nuclear antigen (PCNA) in the GSE2361 dataset, which 
represents gene expression profiles from 36 different 
normal human tissues [61]. We similarly generated a 
metaEphrin index that consisted of the weighted sum of 
the top 30 genes most highly positively correlated with 
EFNA1 in the GSE2361 dataset, using the correlation 
coefficients as weights. We also generated a metaDiff 
index, representing luminal differentiation, that is the 
median expression of a manually curated list of 11 genes 
that were highly expressed in differentiated luminal or lu- 
minal progenitor cells of the normal breast epithelium in 
multiple studies [62-64]. The genes of the metaDiff index 
were: FOXA1, CITED1, GATA3, ESR1, PGR, WNT4, 
KRT19, MUC1, ELF5, KIT, CYP24A1. The Spearman cor- 
relation was performed to determine the relationship be- 
tween the TSTSS and the various indices in individual 
datasets after the removal of outliers. Outliers were defined
as follows: given a data vector x, let q1 and q3 be 
the first and third quartiles of x, that is, q1 = Q(0.25), the 25% 
quantile, and q3 = Q(0.75), the 75% quantile. Define the interquartile
range (IQR) = q3 - q1. Let m be the median of x. 
Define the interval [a, b] = [m - 1.5*IQR, m + 1.5*IQR]. Any 
data outside this interval are outliers. By this definition, 
fewer than six points were dropped from any given data- 
set. The metaPCNA index was also used as a proliferation 
surrogate in multivariate analysis of the prognostic power 
of the TSTSS in the breast cancer datasets. 
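
The outlier rule and rank correlation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several choices are assumptions: the quartile interpolation method (the text does not specify one), the decision to drop a sample when either of its two values is an outlier, and all function names.

```python
import statistics

def outlier_interval(x):
    """Return (a, b) = (m - 1.5*IQR, m + 1.5*IQR), where m is the median.
    Note the interval is centered on the median, as in the text, rather
    than extending from the quartiles."""
    # Assumption: "inclusive" quartile interpolation.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    m = statistics.median(x)
    return m - 1.5 * iqr, m + 1.5 * iqr

def drop_outlier_pairs(x, y):
    """Keep (x, y) pairs whose values both lie inside their own interval
    (assumption: a sample is removed if either value is an outlier)."""
    ax, bx = outlier_interval(x)
    ay, by = outlier_interval(y)
    return [(u, v) for u, v in zip(x, y) if ax <= u <= bx and ay <= v <= by]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy usage with invented values: the point (100, 0) is removed.
signature = [1, 2, 3, 4, 5, 100]
index = [2, 4, 6, 8, 10, 0]
kept = drop_outlier_pairs(signature, index)
rho = spearman([p[0] for p in kept], [p[1] for p in kept])
print(len(kept), rho)
```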

Tumorigenesis 

For tumorigenesis assays, M3 or M4 cells were suspended 
in serum-free DMEM/F12 medium, and 5 × 10^5 cells were 
injected into the #2 and #7 mammary fat pads of six- to 
eight-week-old female athymic NCr nu/nu mice. Tumors 
were measured weekly with calipers and all mice in a 
given experiment were euthanized with CO2 when the 
first tumor in any experimental group reached 2 cm in 
diameter. All animal studies were done under a protocol 
(LC-070) approved by the National Cancer Institute, in ac- 
cordance with Association for Assessment and Accredit- 
ation of Laboratory Animal Care (AAALAC) Guidelines 
and policies established by the NIH. 

Immunohistochemistry and histopathology 

Formalin-fixed, paraffin-embedded tumor xenografts were 
immunostained with antibodies against cytokeratin 8 (CK8; 
Troma-1, Hybridoma Bank, Iowa, IA, USA), estrogen 
receptor α (ER-α; MC-20, sc-542, Santa Cruz Biotechnology
Inc.), angiopoietin-like 4 (#18374, Proteintech Group, 
Chicago, IL, USA) and serpinE1 (AF1786, R&D Systems). 
Immune complexes were detected using the Vectastain 
Elite ABC Peroxidase Kit (Vector Labs, Burlingame, 
CA, USA) and the two-component DAB substrate pack 
(Biogenex, San Ramon, CA, USA), as directed by the 
manufacturers. The primary antibody was omitted as a 
negative control. Images were captured using an Axioplan 
Universal microscope (Zeiss, Oberkochen, Germany), and 
immunostaining in five to ten randomly selected high 
power fields (CK8: 20X objective; ER: 40X objective) was 
quantitated using ImagePro Plus Software (MediaCyber- 
netics Inc., Silver Spring, MD, USA). To determine the 
extent of histological differentiation of the tumors, the 
tumor area occupied by well-differentiated glandular- 
like structures was assessed by a pathologist as previ- 
ously described [65]. 

Accession numbers 

The ChIP-chip and in vitro and in vivo gene expression 
microarray data from this publication are available from 
GEO under the accession number Series GSE34277, consisting
of three constituent datasets: GSE34270, GSE34271 and GSE34276. 

Statistics 

Statistical analyses of experimental data were done in 
GraphPad Prism 5.0 (GraphPad Software Inc., San Diego, 
CA, USA) unless otherwise indicated. 

Results 

Smad3 mediates TGF-β-induced tumor suppression in a 
model of breast cancer progression 

To address the role of TGF-β at different stages of 
breast cancer progression, we used four MCF10A-derived
cell lines developed by Miller and co-workers 
[37,38], as schematized in Figure 1A. Although the parental
nontumorigenic MCF10A cell line is ER negative, 
the derivative cell lines show varying degrees of ER positivity
and biological responses to estrogen in vitro and 
in vivo, and have other characteristics of ER+ breast cancer
(see Methods). Using a dnTβRII to block TGF-β signaling
in vivo, we previously showed that TGF-β acts as 
a tumor suppressor in M2 and M3 cells, but not in the 
closely related M4 cells, where TGF-β instead acts as a
metastasis promoter [25]. Thus, in the transition from M3 
to M4 the tumor suppressive responses are selectively 
lost, and the model system provides a valuable platform 
for the identification of genes specifically involved in 
TGF-β-driven tumor suppression. 

To identify genes at the core of the tumor suppressor 
program, we decided to focus on direct transcriptional targets
of TGF-β, reasoning that hierarchically these would 
be the most upstream regulators of the program. In the 
canonical TGF-β signaling pathway, binding of TGF-βs to 
their cell surface receptors leads to phosphorylation and 
Figure 1 Smad3 mediates tumor suppression by TGF-β in the MCF10A model of breast cancer progression. (A) Schematic illustration of 
the MCF10A-based xenograft model of breast cancer progression. TGF-β has tumor-suppressor activity in M3 cells but this effect is lost in M4 cells 
and instead TGF-β promotes metastasis. (B) Knockdown of Smad2 and Smad3 protein in M3 and M4 cells was verified by Western blot, quantitated 
relative to the β-actin loading control and normalized relative to the shGFP condition for each cell line. (C) Relative contributions of Smad2 and Smad3 
to the tumor-suppressive effect of TGF-β. Mice were orthotopically implanted with M3 cells or M4 cells, genetically modified to stably express shSmad2, 
shSmad3 or the control shGFP, and tumor volumes were assessed after seven weeks (M3) or four weeks (M4). Bars indicate median +/- interquartile 
range; P <0.05 was statistically significant, Mann-Whitney U test; ns, not significant. (D) Kinetics of Smad3 phosphorylation. Western blot of total Smad3 
protein and C-terminal phosphorylated Smad3 (Smad3-CP) levels at various time points after TGF-β treatment in M1 to M4 cells. Total Smad3 is shown 
for t = 0 h. (E) Western blot showing linker phosphorylated Smad3 (Smad3-LP) at 1 hour after treatment with 2 ng/ml TGF-β. TGF-β, transforming 
growth factor beta. 
activation of the signal-transducing components Smad2 
and Smad3, which translocate to the nucleus and regulate 
gene expression. While TGF-β can signal through noncanonical
pathways, canonical Smad signaling is thought 
to be central to TGF-β-driven tumor suppression [66]. 
Smad2 and Smad3 are highly homologous, but may play 
non-redundant or even opposing roles in TGF-β signaling 
[67], so we first wished to determine which Smad was 
more important for tumor suppression in the MCF10-based
breast cancer model. Using shRNA knockdown, we 
found that Smad3 but not Smad2 mediates the TGF-β-induced
tumor suppressive responses in M3 tumors, and 
this tumor suppressive effect of Smad3 was lost in the 
more malignant M4 cell line (Figure 1B,C). We then 
showed that loss of tumor suppression in M4 was not due 
to changes in Smad3 expression, or in the duration or extent of 
Smad3 C-terminal phosphorylation (Figure 1D), which 
were similar between all four cell lines. Input from other 
signaling pathways can lead to phosphorylation of Smads 
on the middle linker region, resulting in loss of select 
tumor suppressive responses [68]. However, basally high 
levels of linker phosphorylation of Smad3 were already 
evident in the malignant M3 cells, which nevertheless 
still retain their tumor suppressive responses to TGF-β 
(Figure 1E), so loss of tumor suppression in M4 cells 
cannot be due to de novo Smad3 linker phosphorylation.
The level of Smad3 linker phosphorylation in M3 
varied somewhat between experiments but was always 
much greater than in M1 and M2, and was either 
slightly less than or similar to that in M4. Thus we rea- 
soned that events downstream of Smad3 activation are 
primarily responsible for the loss of tumor suppression 
in M4 cells. 

Promoter-wide analysis shows major changes in Smad3 
binding patterns with cancer progression 

We next hypothesized that loss of tumor-suppressive activity
is caused by changes in the spectrum of TGF-β/Smad3 
target genes with increasing tumor progression. We therefore
performed promoter-wide ChIP-chip analysis using a 
Smad3-specific antibody (Figure 2A) to identify TGF-β/Smad3
target genes. This antibody immunoprecipitated 
target DNA from a known Smad3 binding region in the 
Smad7 promoter in Smad3 wild-type but not Smad3 null 
mouse mammary epithelial cells (Figure 2B). In the human 
breast cancer cell lines, we showed that Smad3 binding to 
known target sites in the promoters of the SMAD7, 
COL7A1, and SERPINE1 genes in M3 cells was maximal 
by 1 hour after TGF-β addition, so this time point was selected
for the ChIP-chip analysis (Figure 2C). Parallel gene 
expression studies were performed at both 1 hour and 6 
hours after TGF-β addition. 

Smad2 and Smad3 can be activated by other TGF-β 
superfamily members such as the activins [3], as well as 
by the unrelated kinases Mps1 [69], WNK1 [70] and 
MPK38 [71], and by advanced glycation end products 
[72]. In order to focus specifically on TGF-β-driven 
Smad3 binding, we filtered the ChIP-chip data for loci 
that showed differential Smad3 occupancy between the 
untreated and TGF-β-treated states, rather than just 
analyzing the treated state as has been done previously 
[73-75]. This strategy yielded 498 TGF-β-induced Smad3 
binding regions (SBR) corresponding to 404 annotated 
genes across all four cell lines (Additional file 4). Representative
results for IFNK, a novel gene target that only 
shows Smad3 binding in M1 and M2 cells, are shown 
in Figure 2D,E. As expected, the canonical Smad3 binding
motif GTCT, or its reverse complement AGAC, 
was significantly enriched within the SBRs (Figure 2F), 
though the most enriched transcription factor motifs 
were those of the AP-1 family (Figure 2G), with Smad 
motifs frequently co-occurring with AP-1 family motifs 
in the SBRs (Figure 2H). Enrichment of AP-1 motifs in 
Smad2/3 binding regions was previously also observed 
in keratinocytes [73], and probably reflects the ability of 
Smads to bind directly to the AP-1 motif binding site 
through TGF-β-inducible interactions of Smad3 with c-Fos,
c-Jun or Fra1 [76,77]. 
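
As a small illustration of what scanning a binding region for the canonical Smad motif involves, the sketch below counts occurrences of GTCT or its reverse complement AGAC in a DNA sequence. It is only a minimal example: the function names and the test sequence are hypothetical, and the enrichment analysis in the study compared SBRs against random promoter regions using tools not reproduced here.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def count_smad_motifs(seq, motif="GTCT"):
    """Count positions where the motif or its reverse complement starts.
    Reading one strand for both patterns is equivalent to reading the
    motif on either strand. Overlapping matches are counted."""
    seq = seq.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)

# Invented sequence containing one GTCT and one AGAC.
example = "AAGTCTCCAGACTT"
print(reverse_complement("GTCT"), count_smad_motifs(example))
```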

Despite the close genetic relatedness of MCF10- 
derived cell lines, relatively few Smad3 target genes (37/ 
404 = 9.2%) were common to all four lines, with the ma- 
lignant M3 and M4 cells showing a particularly high 
proportion of unique targets (Figure 3A and Additional 
file 4). ChIP-QPCR validation at 25 loci in all four cell 
lines broadly confirmed this unexpected
finding (Additional file 5). Furthermore, global 
TGF-β-regulated gene expression showed a similar pattern,
with large numbers of unique gene targets in the 
four cell lines (Additional file 6), confirming that the observation
of cell-line-dependent gene occupancy and 
regulation in response to TGF-β/Smad3 is not due to 
low sensitivity of the ChIP analysis but instead reflects a 
fundamentally important feature of TGF-β biology. To 
determine the basis of this phenomenon, we selected 10 
target genes representing different patterns of TGF-β-induced
Smad3 occupancy across the four cell lines 
(Figure 3B), and determined whether there were differences
between the cell lines in local DNA methylation 
and chromatin activation state at the SBRs. We found 
that the target promoters were all hypomethylated in 
all four cell lines (Figure 3C), so occupancy patterns 
could not be explained by differential promoter methylation.
However, TGF-β only induced Smad3 occupancy 
at SBRs in regions of chromatin that were activated 
prior to TGF-β treatment, as assessed by ChIP for the 
presence of H3AcK9/14 at the SBR in the untreated state 
(Figure 3D). Thus the spectrum of TGF-β-induced Smad3 
binding in the different cell lines reflects preexisting local 
Figure 2 Identification of Smad3 target genes by ChIP-chip in the MCF10A progression series. (A) The anti-Smad3 antibody recognizes a 
unique band in wild type and Smad2 null but not Smad3 null IMECs by Western blot. (B) ChIP-QPCR showing ability of Smad3 antibody (αS3) to 
immunoprecipitate Smad3 bound to the Smad7 promoter in Smad3 wild-type but not Smad3 knockout mouse embryo fibroblasts. CON, isotype-matched
control antibody. (C) Time course of Smad3 occupancy at promoters of three previously characterized Smad3 target genes assessed by 
ChIP-QPCR following treatment of M3 cells with TGF-β. (D) Genome browser view (hg18) of Smad3 binding in the promoter of IFNK in M1 to M4 
cells. The signal represents the difference between the TGF-β-treated and untreated conditions. The threshold represents signal intensity corresponding 
to FDR = 0.15. Black rectangles represent regions of significant Smad3 binding. (E) ChIP-QPCR validation of Smad3 occupancy at the IFNK locus. Results 
are mean +/-SD (n = 3) normalized to no TGF-β condition. *P <0.05 for enrichment >2-fold. (F) Enrichment of the canonical Smad binding element 
(SBE) in SBRs. The black line represents 190 high confidence SBRs and the grey line represents 190 random promoter regions with no Smad3 
binding. The generic SMAD binding motif is shown. (G) Top 10 enriched transcription factor (TF) matrices within +/-250 bp of the center of 
190 high confidence SBRs. (H) Schematic showing co-occurrence for the most enriched TF motifs. Pairwise analysis of each enriched motif was 
performed using the Fisher's exact test with Bonferroni correction. The adjusted P values for co-occurrence of pairs of TFs are represented by the 
connecting lines: P <1e-7 (purple), P <1e-5 (pink), P <1e-2 (black). ChIP, chromatin immunoprecipitation; FDR, false discovery rate; IMEC, immortalized 
mouse mammary epithelial cells; QPCR, quantitative polymerase chain reaction; SBR, Smad binding region; TGF-β, transforming growth factor beta. 
Figure 3 Smad3 binding differs widely across the progression 
series. (A) Petal plot showing the overlap in Smad3 target genes 
between the different cell lines. The genes involved are given in 
Supporting Information Table S1 in Additional file 4. (B) Representative 
genes with distinct Smad3 occupancy patterns as confirmed by 
ChIP-QPCR. Closed circles indicate TGF-β-induced Smad3 occupancy. 
(C) DNA methylation status at promoter regions of each target gene in 
(B) as determined by MeDIP-QPCR. Relative enrichment in the bound 
(MeDIP) vs. unbound fractions is shown. PPIA and MyoD were controls 
for unmethylated and highly methylated DNAs respectively. (D) QPCR 
quantitation of target enrichment following ChIP using anti-H3AcK9/14 
to identify active chromatin. Enrichment at the SBR was calculated 
relative to input DNA. PPIA and MyoD were controls for active and 
inactive promoters respectively. Active chromatin has an enrichment 
value > 1.00 (indicated by threshold line). ChIP, chromatin 
immunoprecipitation; H3AcK9/14, histone H3 acetylated on lysine 
9 or 14; MeDIP, methylated DNA immunoprecipitation; QPCR, 
quantitative polymerase chain reaction; SBR, Smad binding region; 
TGF-β, transforming growth factor beta. 



differences in the activated chromatin landscape. This 
observation identifies one molecular mechanism that may 
contribute to the well-known contextuality of TGF-β action. 

Identification of Smad3 target genes that contribute to 
the tumor suppressive effects of TGF-β 

Having identified a set of core TGF-β/Smad3 target 
genes in the breast cancer model system using the ChIP-chip
approach, we next determined which of these were 
specifically important for the tumor suppressive functions
of TGF-β by assessing gene expression. Microarray 
analysis of gene expression in vitro (using a P value cutoff
of <0.001 for differential expression between the 
TGF-β-treated and untreated conditions) revealed that 
approximately 50% of the Smad3 occupied genes in each 
of the cell lines showed altered gene expression within 6 
hours after TGF-β treatment (Additional file 7). Since 
the tumor-suppressive effect of TGF-β is lost on progression
from M3 to M4 cells, we focused our subsequent
analysis on these two lines in order to develop a 
TGF-β/Smad3 tumor suppressor signature (TSTSS), as 
schematized in Figure 4. First, we compared patterns of 
TGF-β-regulated expression of Smad3 target genes 
in vitro, reasoning that genes that were regulated in M3 
only should be enriched for Smad3-driven tumor-suppressive
responses. Unsupervised hierarchical clustering
of genes that were heterogeneously expressed across 
the four conditions (Figure 5A) revealed sets of Smad3 
target genes that were (a) uniquely upregulated by TGF-β
in M3 (Cluster I: 38 genes total); (b) similarly regulated
by TGF-β in M3 and M4 (Clusters III and IV: 65 
genes total); and (c) not regulated by TGF-β but basally 
different between the two cell lines (Clusters II and V). 
No distinct clusters of Smad3 target genes that were 
uniquely regulated in M4 cells were identified. 

TGF-β effects are highly context dependent, and the 
in vivo microenvironment provides a different set of 
contextual cues that could affect gene expression. To 
ask whether the 38 Smad3 target genes that were 
uniquely upregulated by TGF-β in M3 cells in vitro were 
also regulated by TGF-β in vivo, we next analyzed gene 
expression in M3 tumors with and without TGF-β pathway
ablation using a dnTβRII [65]. Microarray analysis 
of these tumors showed that 26/38 (68%) of the M3 
unique targets were also regulated by TGF-β in vivo, a 
finding that was confirmed by RT-QPCR (Figure 5B). 
Unexpectedly, nearly 25% of these genes were regulated 
in the opposite direction by TGF-β in vitro and in vivo, 
including the hallmark TGF-β response genes SERPINE1 
and ANGPTL4 (Figure 5B). Since this was a surprising 
result, we then wished to determine whether the discordant
in vitro/in vivo results reflected an involvement 
of alternative TGF-β signaling pathways other than 
Smad3 in regulation of the discrepant genes in the 
Figure 4 Strategy for integration of ChIP-chip and gene expression datasets to generate a core TGF-β/Smad3 tumor suppressor signature. 

The experimental strategy for identification of the TGF-β/Smad3 tumor suppressor signature (TSTSS) is shown. ChIP, chromatin immunoprecipitation; 
TGF-β, transforming growth factor beta. 



in vivo setting. We took representative genes that were 
concordantly (ANXA2, PTPN3, SERPINE2) or discord- 
antly (ANGPTL4, KLF7, SERPINE1) regulated between 
the in vitro and in vivo conditions and analyzed expres- 
sion of the same genes in M3 tumors with and without 
Smad3 knockdown. The results were consistent with a 
role for Smad3 in the in vivo setting as well as in vitro 
(Figure 5C). Thus TGF-β signaling through Smad3 can 
upregulate or repress the same target genes in a given 
cell line depending on the local microenvironmental 
context (in vitro vs. in vivo). To confirm that the dis- 
crepancy in direction of target gene regulation in vitro 
and in vivo was not due to major contributions from the 
stroma in the tumors in vivo, we immunostained M3 tu- 
mors for ANGPTL4 and SERPINE1 and showed that 
these proteins were expressed predominantly in the tumor 
parenchyma (Additional file 8). The data emphasize the 
critical importance of including an in vivo gene expression 
filter in this type of approach, since clearly the directionality 
of Smad3 target gene regulation in vivo cannot reliably be 
extrapolated from in vitro results. 

TGF-β/Smad3 target genes associated with tumor 
suppression predict good clinical outcome in human 
breast cancer datasets 

Taking the core list of 26 genes that survived the 
in vivo filter, we next asked whether this TSTSS was as- 
sociated with clinical outcome in human breast cancer 
datasets. Using the GOBO software [56], which allows a 
directional weighting (upregulated vs. downregulated) 
to be assigned to individual genes in a gene set, we per- 
formed a meta-analysis of the TSTSS in eight clinical 
breast cancer gene expression datasets that used the 
Affymetrix array platform. Twenty out of the 26 genes 
of the TSTSS were represented in the GOBO clinical data- 
sets (Additional file 9A). High expression of the TSTSS 
was strongly associated with better distant metastasis-free 
survival (DMFS) (P = 0.00001) in datasets representing 
Figure 5 Generation of core TGF-β/Smad3 tumor suppressor signature. (A) Unsupervised hierarchical clustering of differentially expressed 
Smad3 target genes in M3 and M4 cells treated with TGF-β in vitro for 6 hours. The 38 genes induced by TGF-β in vitro in M3 only were taken forward 
for further analysis. (B) RT-QPCR validation of the 26 genes out of the original 38 genes that microarray analysis showed to be also regulated by TGF-β 
in M3 tumors in vivo. M3 tumors transduced with the dnTβRII represent the low TGF-β signal condition in vivo, while M3 tumors transduced 
with control lentivirus represent the high TGF-β signal condition in vivo. Expression was normalized to the low signal condition for each gene. 
Results are the mean +/-SEM for three tumors/experimental group. The difference between the high and low TGF-β signaling conditions is 
statistically significant (P <0.05; unpaired t test) for all genes shown. Note the six genes, marked by arrows, which are downregulated by TGF-β 
in vivo whereas they were upregulated in vitro. (C) Smad3 dependence of TGF-β regulation of select target genes in vivo. Further RT-QPCR 
quantitation was performed for six representative genes under the following conditions: (i) M3 cells treated with 5 ng/ml TGF-β (high TGF-β signal 
condition) or vehicle (low signal condition) in vitro; (ii) M3 tumors in vivo following transduction with a dnTβRII to block all TGF-β responses 
(low signal condition) or LacZ control lentivirus (high signal condition); (iii) M3 tumors in vivo following transduction with shSmad3 to block 
Smad3-mediated responses (low signal condition) or shGFP control lentivirus (high signal condition). Results are mean +/-SEM for three to six 
independent samples/group, normalized to low signaling condition. *Statistically significant (P <0.05) for high vs. low signaling condition, 
unpaired t test. dnTβRII, dominant-negative type II TGF-β receptor; TGF-β, transforming growth factor beta. 



1,379 breast cancers when the in vivo directional 
weighting for expression of the signature genes was 
used (Figure 6A). Notably, the prognostic power of the 
signature was greatly decreased (P = 0.05) if weighting was 
used corresponding to the in vitro rather than in vivo 
direction of gene regulation (Figure 6B), thus demonstrating
the utility of our integrated in vitro/in vivo approach. 
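
The effect of directional weighting can be illustrated with a small sketch. It assumes, as described in the Methods, that each sample's score is a weighted sum of gene expression values with weights of +1 or -1, and that samples are called High only when the score exceeds the median. The function names and expression values are invented for illustration. The weights follow the text: ANXA2 is concordantly upregulated, whereas SERPINE1 is upregulated in vitro but downregulated in vivo.

```python
import statistics

def signature_score(expr, weights):
    """Weighted sum of expression over the signature genes present
    in the sample. `expr` maps gene -> expression value and
    `weights` maps gene -> +1 or -1."""
    return sum(w * expr[g] for g, w in weights.items() if g in expr)

def dichotomize(scores):
    """'High' if the score is greater than the median, otherwise 'Low'."""
    med = statistics.median(scores)
    return ["High" if s > med else "Low" for s in scores]

# Directions taken from the text; expression values are hypothetical.
in_vitro_weights = {"ANXA2": 1, "SERPINE1": 1}
in_vivo_weights = {"ANXA2": 1, "SERPINE1": -1}

samples = [
    {"ANXA2": 2.0, "SERPINE1": 0.5},
    {"ANXA2": 1.0, "SERPINE1": 3.0},
    {"ANXA2": 1.5, "SERPINE1": 1.0},
]

vivo_groups = dichotomize([signature_score(s, in_vivo_weights) for s in samples])
vitro_groups = dichotomize([signature_score(s, in_vitro_weights) for s in samples])
print(vivo_groups)
print(vitro_groups)
```

In this toy case the first two samples swap groups depending on which weighting is used, which is the kind of reassignment that can change the apparent prognostic power of a signature.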

Permutation analysis in the GSE6532 dataset showed 
that the in vivo TSTSS combination outperformed 97% 
of all possible combinations of the binary weight vectors, 
Figure 6 Meta-analyses correlating TGF-β/Smad3 target genes with outcome in human breast cancer datasets. Kaplan-Meier analyses 
were performed using the online GOBO tool to assess the association of the TGF-β-regulated gene sets with distant metastasis-free survival (DMFS) in 
meta-analyses across multiple breast cancer cohorts (1,379 tumors from eight datasets). Patient datasets were dichotomized to higher than median 
expression (black) or lower than median expression (grey) of the gene set. P values were determined by the log-rank test. (A,B) Kaplan-Meier plots for 
survival of all patients in the GOBO datasets, using the set of TGF-β/Smad3 target genes that were uniquely regulated in M3 (the TSTSS). This gene set 
was designed to be enriched in genes involved in tumor suppression. Weighting (positive or negative) of individual target genes is based on 
the directionality of TGF-β-regulated gene expression observed in M3 tumors in vivo (A) or in M3 cells in vitro (B) as indicated. (C,D) Correlation of the 
TSTSS (using the in vivo directional weighting) with DMFS in ER+ (n = 856) (C), and ER- (n = 320) (D) patient subsets of the GOBO cohorts. ns, 
not significant. ER, estrogen receptor; GOBO, gene expression-based outcome for breast cancer online; TGF-β, transforming growth factor beta; 
TSS, transcriptional start site; TSTSS, TGF-β/Smad3 tumor suppressor signature. 



and 98% of 10,000 random subsets of TSTSS genes and 
random binary weight vectors (see Methods for more details).
Furthermore, the prognostic power of the in vitro/in vivo
concordant and discordant gene sets when analyzed
separately was much lower than that of the full 
TSTSS (Additional file 10). Using an identical strategy to 
that used to generate the TSTSS, we derived a 'generic' 
TGF-β signature from the 65 Smad3 target genes that were 
regulated by TGF-β in both M3 and M4 in vitro, of which 
24 genes survived the in vivo filter (Additional file 9B). As 
expected, this signature performed much less well than did 
the TSTSS (Additional file 11), which highlights the importance
of enriching the signature for genes functionally 
associated with tumor suppression in order to make the 
TGF-β-driven tumor suppressive signal detectable in clinical 
samples. It should be noted that TGF-β also functions as a 
tumor suppressor in premalignant M2 cells [25], but the 
TSTSS is not evident in M2 cells (not shown). We believe 
this is because TGF-β likely suppresses the premalignant-to-malignant
transition (M2) and tumor progression (M3) by 
different mechanisms. Since the patient datasets represent 
tumors from later stages in progression, here we focused 
specifically on the M3 tumor-suppressor signature. 

Consistent with our use of an ER+ model to generate 
the signature and our observations of the strong contextuality
of TGF-β-regulated gene expression, the prognostic 
power of the in vivo-weighted TSTSS was restricted to the 
ER+ tumors only in the GOBO datasets (Figure 6C,D). In 
multivariate analysis of ER+ patients in the GOBO cohort, 
lower than median expression of the TSTSS was associ- 
ated with increased risk of distant metastasis (hazard 
ratio = 1.85, confidence interval = 1.28 to 2.67; P = 0.001; 
n = 553 evaluable patients), independent of lymph node 
status, tumor grade, age at diagnosis and tumor size. The 
signature also prognosticated in three independent co- 
horts of ER+ breast cancer patients: the NKI cohort using 
custom spotted cDNA arrays [58], and the TCGA cohort 
[78] and BT2000/Metabric cohorts [79] both using Illu- 
mina arrays (Additional file 12). Thus performance of the 
signature is robust across different datasets and array platforms.
Our data suggest that TGF-β/Smad3-mediated 
tumor suppression plays an important role in the natural 
history of ER+ breast cancer, and that tumor-suppressive 
effects of TGF-β are still evident in the tumor at the time 
of surgery and influence disease outcome for a significant
fraction of patients. 

TGF-p/Smad3 effects on tumor cell proliferation and 
differentiation in breast cancer 

We next asked what biological activities might underlie 
the tumor-suppressive effects of TGF-β/Smad3 in breast 
cancer. Inhibition of epithelial cell proliferation is a hallmark
activity of TGF-β [3], and we have previously 
shown that TGF-β strongly inhibits proliferation of M3 
cells but not M4 cells in vitro [25]. Furthermore, the 
prognostic power of many breast cancer signatures is 
driven by proliferation [80]. To address the relationship 
between the TSTSS and proliferation in the clinical 
breast cancer datasets, we used a metaPCNA index as a 
surrogate for proliferation [60]. There was a weak but 
highly statistically significant negative correlation be- 
tween the metaPCNA index and the TSTSS in ER+ 
breast tumors but not ER- tumors in multiple independ- 
ent cohorts. Results for the TCGA cohort are shown in 
Figure 7A. Similar results were obtained for ER+ tu- 
mors in the GSE6532/Loi and NKI cohorts (Additional 
file 13). These results suggest that antiproliferative effects
of TGF-β are still active in ER+ tumors. However, 
the TSTSS still prognosticated independently of proliferation
in multiple independent datasets by multivariate 
analysis (Table 1), suggesting that the tumor-suppressive 
effects of TGF-β in ER+ breast cancer must also involve 
additional biological mechanisms. 

In ER+ tumors, GOBO meta-analysis showed that 
the TSTSS was inversely correlated with tumor grade 
(P = 0.00001; Figure 7B), which in part reflects histo- 
logic differentiation [81]. A total of 7/26 genes of the 
TSTSS were annotated for involvement in cellular dif- 
ferentiation (ANXA2, CTGF, EFNA1, ITGA4, LAMB3, 
PTPN11, SPAG9), and we also found that the TSTSS was 
weakly positively correlated in ER+ but not ER- tumors 
with a meta-differentiation index (see Methods for 
definition) that reflects luminal differentiation (Figure 7C). 
To demonstrate a causal role for TGF-β/Smad3 in regulating
breast cancer differentiation, we showed that knockdown
of Smad3, but not Smad2, was associated with 
reduced development of well-differentiated glandular-like 
structures in M3 tumors (Figure 7D,E), and a significant 
reduction in expression of the differentiated luminal 
markers cytokeratin 8 (CK8) and ER (Figure 7D,F). Thus 
the tumor-suppressive effects of TGF-β in ER+ breast cancer
include a role in enhancing cellular differentiation, and 
Smad3 is a critical mediator of this activity. 

Ephrin signaling contributes to tumor-suppressive effects 
of TGF-β in ER+ breast cancer 

To begin to explore molecular mechanisms underlying 
TGF-β-driven tumor suppression, we performed Ingenuity 
Pathway Analysis on the signature genes. Network analysis 
was relatively uninformative for this small number of 
genes and did not provide any useful leads other than in- 
dicating that a number of the signature genes were subject 
to regulation by ubiquitination (Additional file 14). How- 
ever, pathway analysis identified ephrin receptor sig- 
naling as the most enriched pathway in the TSTSS 
(Figure 8A), with the genes involved being EFNA1, 
ITGA4, LIMK2 and PTPN11. The ephrin ligands and 
ephrin receptors are a large family of membrane-bound 
proteins that signal bidirectionally in a cell-contact- 
dependent manner. Ephrin-A1 (EFNA1) ligand binding to 
the EphA2 receptor at sites of cell-cell contact maintains 
epithelial phenotype and integrity in part by downregulat- 
ing Akt, Rho/Rac and Ras pathway signaling [82-84]. In 
contrast, unligated EphA2 becomes phosphorylated by 
Akt on Ser897. This phosphorylation event results in 
loss of suppressive effects on the Ras/Erk pathway and 
enhanced pro-oncogenic signaling through Akt and 
Rac1, ultimately leading to increased cell migration, invasion,
proliferation and survival [84]. Thus depending 
on the balance of ligand and receptor, ephrin pathway 
signaling can be associated with pro-oncogenic or anti- 
oncogenic outcomes. 

To investigate the interrelationship between ephrin 
signaling and TGF-β, we generated a metaEphrin index 
consisting of the top 30 genes most highly correlated with 
EFNA1 mRNA in normal human tissues, and we showed 
that this index was strongly positively correlated with the 
TSTSS in ER+ breast cancer datasets (see Figure 8B for 
the TCGA cohort; similar results were also seen for the 
GSE6532 and NKI cohorts in Additional file 13). The 
index was also weakly associated with good outcome in 
ER+ breast cancers by GOBO meta-analysis (Figure 8C), 
suggesting that ephrin pathway signaling might contribute 
to the tumor-suppressive effects of TGF-β in ER+ breast 
cancer. Using ChIP-QPCR, we confirmed that TGF-β 
induced Smad3 occupancy at the EFNA1 promoter to a 
Figure 7 Loss of TGF-β signaling leads to reduced tumor differentiation. (A) The TSTSS is weakly anticorrelated with proliferation, as assessed 
using a metaPCNA index, in ER+ but not ER- breast cancers (TCGA cohort). (B) Expression of the TSTSS inversely correlates with tumor grade in ER+ 
breast cancer (n = 904 patients), as assessed using the GOBO tool. (C) Correlation between the TSTSS and luminal differentiation, as assessed using 
a meta-differentiation index, in ER+ and ER- tumors in the TCGA cohort. (D) Immunohistochemical staining of cytokeratin 8 (CK8) and ER-α in 
primary tumors from M3 cells expressing shGFP, shSmad2 or shSmad3. Scale bars represent 100 μm. (E) Quantitation of % area occupied by 
structures with a differentiated glandular histology in M3 tumors expressing shGFP (control), shSmad2 or shSmad3. Results are mean +/-SEM 
for five tumors/group. (F) Quantitation of CK8 and ER staining was performed using Image-Pro Plus software. Each datapoint represents the 
mean of five fields/tumor, and results are shown as mean +/-SEM for five tumors/group. *P <0.05 for one-way ANOVA with Dunnett's multiple 
comparison test; ns, not significant; hpf, high power field. ER, estrogen receptor; GOBO, gene expression-based outcome for breast cancer 
online; TGF-β, transforming growth factor beta; TSTSS, TGF-β/Smad3 tumor suppressor signature. 



much greater extent in M3 than M4 cells (Figure 8D), 
reflecting a more active state of the chromatin at the 
EFNA1 promoter in M3 than M4 cells under basal condi- 
tions (Figure 8E). As expected, EFNA1 mRNA expression 



was upregulated by TGF-p treatment in M3 and not M4 
cells (Figure 8F). Upregulation of Ephrin-Al protein by 
TGF-p in M3 cells in vitro was confirmed by Western 
blot and was associated with a corresponding decrease 
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Table 1 Prognostic power of the TSTSS in three independent ER+ breast cancer datasets 

Breast cancer cohort 



Variable 


Ref for HR = 1 


GSE6532 (Loi) 






NKI 






TCGA 




Hazard ratio CI 


P value 


Hazard ratio 


CI 


P value 


Hazard ratio 


CI 


P value 


TSTSS 


High 


1.89 1.1-3.27 


0.02211 


1.81 


1.12-2.93 


0.01535 


2.6 


1.21-5.58 


0.0142 


metaPCNA 


Low 


2.32 1 .27-4.24 


0.006251 


1.67 


0.96-2.92 


0.07143 


0.77 


0.38-1.58 


0.4788 


Age 


Low* 


1.03 0.63-1.69 


0.9 


0.52 


0.32-0.84 


0.007841 


1.88 


0.94-3.75 


0.07393 


Node 


Negative 


0.91 0.54-1.53 


0.7113 


0.84 


0.52-1.34 


0.4651 


2.33 


1.07-5.08 


0.03389 


Grade 


Grade 1 


0.99 0.67-1.47 


0.9798 


1.57 


1.1-2.25 


0.01365 


na 


na 


na 


Size 


Continuous 


1.33 1.09-1.62 


0.005177 


1.22 


0.95-1.57 


0.1109 


na 


na 


na 



Multivariate analysis was performed using the Cox proportional hazards model. Nineteen out of the 26 genes of the TSTSS were found in the GSE6532 (Loi) and 
NKI cohorts, and 24 out of 26 genes were found in the TCGA cohort. The TSTSS was analyzed as a binary variable where high indicates higher than median 
expression. The hazard ratio refers to distant metastasis-free survival for GSE6532 and NKI, and overall survival for TCGA. The metaPCNA index is a surrogate for 
proliferation and low indicates lower than median expression. Parameters for the TSTSS are highlighted in bold. *Low is <61 yrs (GSE6532); <45 yrs (NKI); <60 yrs 
(TCGA). HR, hazard ratio; CI, confidence interval; TSTSS, TGF-p/Smad3 tumor suppressor signature; na, not available. 



in pro-oncogenic phosphorylation of the receptor EphA2 
on Ser897 (Figure 8G). Thus by upregulating Ephrin-Al, 
TGF-p treatment can repress pro-oncogenic signaling 
through the ephrin pathway in M3 cells. To demonstrate 
a functional role for Ephrin-Al in tumor suppression, 
we used shRNA knockdown in M3 cells (Figure 8H) 
and showed that ephrin-Al knockdown significantly in- 
creased primary tumorigenesis (Figure 81). Thus activa- 
tion of Ephrin-Al signaling by TGF-p may contribute 
to the tumor-suppressive effects in ER+ breast cancer. 
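
The construction of a metagene index of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: genes are ranked by Spearman correlation with a seed gene in a reference dataset, the index is taken as the per-sample mean of the selected genes, and all gene names other than EFNA1, all values and all function names are hypothetical. The text does not state which correlation measure or averaging rule was used for gene selection.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for j in range(pos, end + 1):
            result[order[j]] = (pos + end) / 2 + 1
        pos = end + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_correlated_genes(reference, seed_gene, top_n):
    """Genes most positively correlated with seed_gene in the reference set."""
    candidates = [g for g in reference if g != seed_gene]
    candidates.sort(key=lambda g: spearman(reference[g], reference[seed_gene]),
                    reverse=True)
    return candidates[:top_n]


def index_per_sample(expression, genes):
    """Metagene index: mean expression of the chosen genes in each sample."""
    n_samples = len(expression[genes[0]])
    return [mean(expression[g][s] for g in genes) for s in range(n_samples)]


# Hypothetical toy data: five reference tissues, two tumors.
normal_tissues = {
    "EFNA1": [1, 2, 3, 4, 5],
    "GENE_A": [2, 4, 6, 8, 10],
    "GENE_B": [5, 4, 3, 2, 1],
    "GENE_C": [1, 3, 2, 5, 4],
}
tumors = {"GENE_A": [1.0, 3.0], "GENE_B": [9.0, 9.0], "GENE_C": [3.0, 5.0]}

index_genes = select_correlated_genes(normal_tissues, "EFNA1", top_n=2)
meta_index = index_per_sample(tumors, index_genes)
```

In the study itself the selection would use the top 30 genes, and the resulting index would then be correlated against the TSTSS score across tumors.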

Discussion 

The highly pleiotropic nature of the TGF-β signaling pathway creates
significant hurdles that must be overcome before the complex roles of
this pathway in tumorigenesis can be understood and therapeutically
exploited in breast cancer. Our integrated genomic approach is the
first that specifically isolates the tumor-suppressive gene
expression responses to TGF-β from the tumor-promoting or
tumor-unrelated responses, and our results suggest that TGF-β-driven
tumor-suppressive responses contribute importantly to good clinical
outcome in a subset of patients with ER+ breast cancer. A number of
interesting features of TGF-β tumor biology were uncovered in this
study and the broader implications are discussed further below.

Previous genome-wide ChIP studies have shown considerable differences
in Smad target genes between cell lines of different origins
[73,74,85]. However, here we found that the spectrum of Smad3 target
genes differed to a surprising extent between four highly related
breast-derived cell lines, the parental MCF10A line (M1) and its
three premalignant or malignant derivatives (M2, M3, M4), despite
their common origin and identical test conditions. Furthermore, we
showed that TGF-β-induced Smad3 binding occurred in regions of
chromatin that were already activated prior to TGF-β treatment,
suggesting that the spectrum of Smad3 binding is highly sensitive to
preexisting local differences in the epigenetic landscape. This
feature of Smad3 effector activity may contribute significantly to
the known contextuality of TGF-β action, and raises the possibility
that TGF-β-induced Smad3 promoter occupancy mostly serves to
fine-tune ongoing transcriptional programs in this model rather than
initiating new programs. Consistent with this hypothesis, it was
recently shown that during embryonic development Smad3 binds
primarily to promoter regions that are already occupied by the master
transcription factors for the developmental stage or lineage [86].

Our novel observation that the direction of regulation of Smad3
target genes by TGF-β can differ in vitro and in vivo identifies an
additional new facet of TGF-β contextuality that is highlighted by
the effects of TGF-β on ANGPTL4. ANGPTL4 was previously identified as
a metastasis-promoting gene that was upregulated by TGF-β in
MDA-MB-231 cells, an ER-negative breast cancer model in which TGF-β
has lost its tumor-suppressive activity and instead promotes
progression [34]. While ANGPTL4 was upregulated by TGF-β in both
MDA-MB-231 and M3 cells in vitro, we found that ANGPTL4 was actually
downregulated by TGF-β/Smad3 in vivo in the M3 tumors where TGF-β
functions as a tumor suppressor. This is in contrast to the
upregulation in vivo of ANGPTL4 in MDA-MB-231 tumors where TGF-β
functions as a pro-progression factor. Thus the in vitro condition
serves to identify TGF-β target genes, but it does not indicate the
direction of their regulation in the in vivo context and hence cannot
predict the critical issue of biological outcome (tumor suppression
vs. tumor progression). Identifying the factor that causes the
direction of expression of certain Smad3 target genes to flip in vivo
will be an interesting challenge for the future.

Our integrated genomic strategy allowed us to dissect out a core
TGF-β/Smad3 gene signature that specifically reflected the
tumor-suppressive activities of TGF-β



Sato et al. Breast Cancer Research 2014, 16:R57 
http://breast-cancer-research.com/content/16/3/R57






[Figure 8 graphic, panels A to I. Panel A: pathway enrichment bars, x-axis -log (B-H p value), for Ephrin Receptor Signaling, Role of Tissue Factor in Cancer,
RhoGDI Signaling, Ephrin A Signaling, Axonal Guidance Signaling, Signaling by Rho Family GTPases and Caveolar-mediated Endocytosis Signaling.
Panel B: scatter plot against TSTSS, Corr = 0.461; p = 7.73e-23. Panel C: survival curves over Time (Years), Low (n=490) and High (n=366), p=0.03.
Bar charts: Con and +TGF-β conditions by cell line (M3, M4). Western blots: EFNA1, pEphA2-S897, EphA2 and β-actin; EFNA1 and β-actin.
Growth curves over Time (days): Parent, shGFP, shEFNA202 and shEFNA203.]



Figure 8 Ephrin signaling contributes to the tumor suppressive effects of TGF-β in ER+ breast cancer. (A) Pathway enrichment in the TSTSS
assessed by Ingenuity Pathway Analysis. Fisher's exact test with Benjamini-Hochberg (B-H) correction. The dotted line represents the P <0.05 significance
threshold. (B) Correlation of TSTSS with meta-EFNA index in ER+ tumors of the TCGA cohort. (C) Association of meta-EFNA index with distant
metastasis-free survival (DMFS) in ER+ breast cancer (n = 856 patients) using the GOBO tool. (D) Smad3 ChIP-QPCR at the EFNA1 locus in M3
and M4 cells. (E) ChIP-QPCR for H3AcK9/14 at the EFNA1 locus to identify active chromatin. (F) Time course of EFNA1 mRNA induction by
TGF-β (2 ng/ml) in M3 and M4 cells. Results are mean +/- SEM of three determinations. *P <0.05. (G) Western blot of effect of TGF-β treatment
of M3 cells on Ephrin A1 (EFNA1) expression and oncogenic signaling through phosphorylation of the EphA2 receptor on S897. EFNA-Fc was
used as a positive control for activation of the EphA2 signaling path. (H) Western blot showing knockdown of EFNA1 in M3 cells. (I) Knockdown of
EFNA enhances tumorigenesis in M3 cells, n = 8 to 10 mice/group. *P <0.05 one-way ANOVA, Tukey's multiple comparison test. ChIP, chromatin
immunoprecipitation; ER, estrogen receptor; GOBO, Gene expression-based Outcome for Breast cancer Online; H3AcK9/14, histone H3 acetylated
on lysine 9 or 14; QPCR, quantitative polymerase chain reaction; TGF-β, transforming growth factor beta; TSTSS, TGF-β/Smad3 tumor suppressor signature.



in vivo and was associated with good outcome in multiple independent
ER+ breast cancer cohorts. In multivariate analysis, below-median
expression of the signature was associated with a two-fold increased
hazard ratio for development of distant metastases. This finding
strongly suggests that the tumor-suppressive effects of TGF-β persist
and limit progression in a significant fraction of breast cancers at
the time of clinical intervention. Importantly, our TGF-β/Smad3
signature only prognosticated well when the signature genes were
weighted for the direction of regulation that was seen in vivo and
not in vitro. Thus two key features of our approach were critical for
identifying a discernable tumor suppressor signal for TGF-β in the
clinical datasets. One was the use of closely related breast cancer
cell lines with and without an intact tumor-suppressive response so
that TGF-β-regulated genes that were specifically involved in tumor
suppression could be readily identified, and the other was the
coupling of the in vitro discovery steps with in vivo validation.
Prognostic TGF-β signatures have previously been generated through
strategies that did not explicitly separate the different activities
of TGF-β [11,32,34,36]. Almost universally, the signatures are
associated with poor outcome in breast cancer patients, suggesting
that they primarily capture the pro-oncogenic effects of TGF-β
[18,32,34,36]. In the one exception, a TGF-β signature reflecting
basal gene expression differences between wild-type and TGFBR2
knockout mouse mammary tumor cells weakly correlated with good
outcome in ER+ breast cancers [11]. However, there is little overlap
between this signature and our own, suggesting that the different
approaches used are capturing complementary aspects of the underlying
TGF-β biology. Thus the TGFBR2-based signature implicates TGF-β in
the suppression of local inflammation in the tumor microenvironment
[11], whereas ours highlights tumor cell-autonomous effects of TGF-β
on tumor cell proliferation and differentiation (see later).
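
Why the directional weighting matters can be shown with a small sketch. It assumes, purely for illustration, that a signature score is the mean of weight times z-scored expression across signature genes, with weights of +1 or -1 for up- or downregulation, and that patients are split at the median score. The gene names, values and function names are hypothetical, and the actual scoring used by the GOBO tool may differ.

```python
from statistics import mean, median, pstdev


def zscores(values):
    """Standardize one gene across samples (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def signature_scores(expression, weights):
    """Per-sample score: mean of weight * z-score over weighted genes."""
    genes = [g for g in weights if g in expression and weights[g] != 0]
    standardized = {g: zscores(expression[g]) for g in genes}
    n_samples = len(standardized[genes[0]])
    return [
        mean(weights[g] * standardized[g][s] for g in genes)
        for s in range(n_samples)
    ]


def dichotomize_at_median(scores):
    """Label samples 'high' when strictly above the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical two-gene signature measured in four tumors. GENE_2 is a
# discordant gene: upregulated in vitro (+1) but downregulated in vivo (-1).
expression = {
    "GENE_1": [1.0, 2.0, 3.0, 4.0],
    "GENE_2": [4.0, 3.0, 2.0, 1.0],
}
in_vitro_weights = {"GENE_1": 1, "GENE_2": 1}
in_vivo_weights = {"GENE_1": 1, "GENE_2": -1}

scores_in_vitro = signature_scores(expression, in_vitro_weights)
scores_in_vivo = signature_scores(expression, in_vivo_weights)
groups_in_vivo = dichotomize_at_median(scores_in_vivo)
```

In this toy case the in vitro weighting cancels the two genes against each other and every tumor scores zero, while the in vivo weighting separates the tumors into low and high groups.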

We found that high expression of our TGF-β/Smad3 tumor-suppressor
signature was associated with good outcome only in patients with ER+
tumors, suggesting that TGF-β plays a particularly important role in
limiting progression of this breast cancer subtype. Consistent with
these results, TGF-β1 was previously identified as a differentially
expressed hub in gene expression networks derived from normal luminal
ER+ breast epithelial cells, but not from ER- cells [87], and TGF-β
was shown to restrain the proliferation of ER+ mammary cells in
normal mice [88]. Interestingly, however, we found higher overall
expression of the signature genes in ER- breast cancer (Additional
file 15), although no association with clinical outcome was observed
in this tumor subtype. This observation suggests that the TGF-β/Smad3
tumor-suppressor program, while still detectable, has been
functionally overridden or subverted by the oncogenic pathways that
are activated in ER- tumors. It is currently not clear whether TGF-β
has distinct tumor-suppressive effects in ER- tumors that are not
captured by our signature, since our signature was developed using an
ER+ model. However, given our demonstration of the strong context
dependence of Smad3 binding and thus of TGF-β effects, we believe
that there may not be a universal signature for TGF-β-driven tumor
suppression. Thus the very important goal of developing gene
signature-based predictive biomarkers for patient inclusion/exclusion
from clinical trials with TGF-β antagonists is a challenging one, and
our data suggest that such signatures will likely have to be tailored
to the specific tumor subtype. Based on our current findings, we
propose that patients with ER+ tumors and high signature expression
would not be good candidates for TGF-β antagonist therapy, but that
the mere presence of this particular TGF-β tumor-suppressor signature
in ER- breast cancer patients would not necessarily be a
contraindication for such therapy.

Further analysis of our signature gave insights into the mechanisms
of tumor suppression by TGF-β in breast cancer. High expression of
the signature was inversely correlated with proliferation index and
tumor grade, suggesting that the known antiproliferative and
differentiation-promoting effects of TGF-β do contribute to tumor
suppression in human breast cancer. We had previously shown that
TGF-β can induce differentiation in this breast cancer model [65],
and here we demonstrated that this effect is mediated by Smad3.
However, since the signature still prognosticated independently of
proliferation and tumor grade in multivariate analysis, there are
likely to be additional as yet unidentified biological activities
that also contribute to the TGF-β-driven tumor suppression. In terms
of molecular mechanism, our signature implicated Ephrin-A1 as a novel
downstream mediator contributing to TGF-β-driven tumor suppression.
Like TGF-β signaling, ephrin signaling can have pro-oncogenic or
anti-oncogenic effects, depending on the relative levels of ephrin
ligands and receptors, and the nature of the target cell [82-84]. In
breast cancer model systems, Ephrin-A1 signaling through the EphA2
receptor on the tumor cell can inhibit tumorigenesis [89,90], whereas
excess unliganded EphA2 promotes tumorigenesis through enhanced
proliferation and migration [90,91]. However, tumor-derived Ephrin-A1
ligand can also have pro-oncogenic effects by promoting tumor
angiogenesis through the stimulation of EphA2 signaling on
endothelial cells [92]. Thus, as for TGF-β, a complex balance of
biological activities is at play. Our bioinformatic data revealed a
strong statistical relationship between TGF-β tumor suppression and
ephrin signaling in ER+ breast cancer datasets, and we demonstrated
experimentally that tumor-autonomous ephrin signaling suppresses
tumorigenesis in the M3 breast cancer model. Thus enhanced ephrin
signaling plausibly contributes to tumor suppression by TGF-β in ER+
breast cancer.

Conclusions 

We have generated a TGF-β/Smad3-driven gene expression signature that
specifically captures the tumor-suppressive effects of TGF-β in ER+
breast cancer. High expression of this signature was associated with
good clinical outcome in multiple ER+ breast cancer cohorts,
suggesting that tumor-suppressive effects of TGF-β are still active
and slowing disease progression at the time of surgery in a
significant fraction of breast cancer patients. Clearly such patients
should be excluded from treatment with therapeutic TGF-β antagonists.
At a molecular level of resolution, we found that cellular responses
to TGF-β are even more sensitive to contextual cues than was
previously appreciated, which suggests that distinct TGF-β signatures
may have to be generated for different tumor types or subtypes.
However, using integrated in vitro/in vivo strategies such as ours,
it is clearly possible to assess whether the good or the bad sides of
TGF-β dominate in determining disease outcome. Such information will
set the stage for safer and more effective therapeutic exploitation
of this important signaling pathway in cancer.

Additional files



Additional file 1: Immunostaining for estrogen receptor in tumors
from M3 and M4 cells. Xenografted tumors from M3 and M4 cells were
immunostained for estrogen receptor alpha (ERα) as described in Methods.
Extensive ERα staining (brown nuclei) is apparent in well-differentiated
regions of the M3 tumors, but is also seen in regions of the more
poorly differentiated M4 tumors. H&E, hematoxylin and eosin.

Additional file 2: Primer pairs for QPCR. All primers are in 5' to 3' orientation.

Additional file 3: Primer pairs for RT-QPCR. All primers are in 5' to 3' orientation.

Additional file 4: Location of Smad3 binding regions and their
TGF-β-induced occupancy by Smad3 in the four cell lines. Location of the
404 annotated SBRs and their occupancy in M1 to M4 cells was determined
from the ChIP-chip analysis. +, occupied; -, unoccupied. Genome coordinates
are Hg18.

Additional file 5: ChIP-QPCR validation of Smad3 target genes with
different occupancy patterns between the four cell lines. (A) A total
of 25 Smad3 target genes identified by TAS analysis of ChIP-chip data using
an FDR of 0.15 were validated by ChIP-QPCR across all four cell lines and
the pattern of gene occupancy was compared between the two methods.
The table gives the summary of the results. ChIP-QPCR peaks were scored
positive if TGF-β-induced occupancy was significant (P <0.05), and >2-fold
over untreated. A total of 65/67 of the Smad3 binding regions identified by
TAS from ChIP-chip in one or more of the cell lines were validated by QPCR.
However, QPCR was more sensitive and identified an additional 11/33
instances of Smad3 binding in genomic regions that were called negative
in one or more of the cell lines by TAS (see for example Smad3 occupancy
of LAMB3 promoter in M1 and M2). Thus the experimentally determined
FDR was 14%, with the majority (85%) of the false calls by ChIP-chip being
false negatives. IL31RA is included as an example of a gene that did not
show Smad3 occupancy in the ChIP-chip analysis. (B) Representative
ChIP-QPCR validation results are given for genes that show different patterns
of Smad3 promoter occupancy between the four cell lines by ChIP-chip.
Results are mean +/- SEM for three replicates. *Smad3 occupancy was
induced >2-fold by TGF-β and was statistically significant (P <0.05; unpaired
t test). IL31RA was selected as a gene that did not show Smad3 binding by
ChIP-chip. αS3, anti-Smad3 antibody; CON, control IgG.

Additional file 6: Petal plot showing patterns of TGF-β-regulated
gene expression in M1 to M4 cells. Global TGF-β-regulated gene
expression was determined by microarray analysis at the 6 hour time
point for all four cell lines. Using a fold-change cutoff of 1.5x and a
significance cutoff of P = 0.001, a total of 563 genes were found to be
significantly changed in their expression across the four cell lines. The
majority of these genes were unique to the individual cell lines.

Additional file 7: Expression of Smad3 occupied genes in vitro.
Considering only those genes that showed TGF-β-induced Smad3 occupancy,
the fraction of TGF-β/Smad3 target genes showing regulated mRNA
expression at 1 hour or 6 hours was determined from the microarray
analysis using a P value cutoff of <0.001 for differential expression
between the TGF-β-treated and untreated condition for a given cell line
and time point. SBR, Smad3 binding region.

Additional file 8: Immunostaining of M3 tumors for ANGPTL4 and
SERPINE1. M3 tumors were immunostained for ANGPTL4 and SERPINE1
as described in Methods. Immunostaining for both proteins was observed
predominantly in the tumor parenchyma (T) and not in the stroma (S).
Scale bar represents 25 μm.

Additional file 9: Smad3 target genes uniquely regulated by TGF-β
in M3 cells, and Smad3 target genes commonly regulated by TGF-β
in both M3 and M4 cells. From the gene expression microarray data, 38
TGF-β/Smad3 target genes were found to be uniquely regulated by TGF-β
in M3 and not in M4 cells in vitro (Tab A). A total of 65 TGF-β/Smad3
target genes were found to be regulated by TGF-β in both M3 and M4
cells in vitro (Tab B). The tables summarize the direction of regulation of
these genes by TGF-β in vitro and in vivo. The direction of regulation
in vivo was determined by comparison of gene expression array data
from M3 tumors with or without overexpression of a dnTβRII. The table
also indicates which of the genes that survived the in vivo filter were
represented in the GOBO clinical breast cancer array datasets. The 26
genes that were uniquely upregulated by TGF-β in M3 cells in vitro and
in vivo were validated by RT-QPCR of the tumors (Figure 5). Key: 1 =
upregulated by TGF-β; 0 = not regulated by TGF-β; -1 = downregulated
by TGF-β; NA, not applicable. For clarity, the final weighted TSTSS
signature is also given in Tab C. The table is given as an Excel spreadsheet
with three tabs: Tab A, Tab B and Tab C.

Additional file 10: Prognostic power of the in vitro/in vivo concordant
and discordant gene sets from the TSTSS when analyzed separately.
Kaplan-Meier survival curves and multivariate analyses were generated
within the GOBO breast cancer datasets using (A) only genes whose
direction of regulation by TGF-β was concordant between in vitro and
in vivo; (B) only genes whose direction of regulation was discordant
in vitro and in vivo; and (C) the full TSTSS which includes both gene sets.
Of the 26 genes of the TSTSS, 20 were concordant and 6 were discordant.
A total of 16 of the 20 concordant genes were found in the GOBO datasets
(not found: ANXA2, C15orf57, FRMD6 and IRF2BP2). Four of the six discordant
genes were found in the GOBO datasets (not found: FMNL2 and TMEM88).

Additional file 11: Performance of a 'generic' TGF-β signature in
breast cancer cohorts using the GOBO meta-analysis tool. Kaplan-Meier
analyses were performed using the online GOBO tool to assess the
association of the TGF-β-regulated gene sets with distant metastasis-free
survival (DMFS) in a meta-analysis across the multiple breast cancer cohorts
of the GOBO dataset (1,379 tumors from eight cohorts). Patient datasets
representing all tumors were dichotomized to higher than median
expression (black) or lower than median expression (grey) of the gene
set. P values were determined by the log-rank test. This figure shows
Kaplan-Meier plots for survival of all patients in the GOBO datasets,
using a 'generic' TGF-β signature derived from the set of TGF-β/Smad3
target genes that were regulated in both M3 and M4 cells, and thus
not enriched for tumor-suppressor activity. Weighting (positive or
negative) of individual target genes is based on the directionality of
TGF-β-regulated expression of this gene set in M3 cells in vitro (A) or in
M3 tumors in vivo (B) as indicated. The list of genes involved is given in
Additional file 9. Note the greatly reduced statistical power of this signature
when compared to the use of the TSTSS tumor-suppressor signature in the
same cohorts (Figure 6A). Note also that high expression of the generic
TGF-β signature is associated with either good or bad outcome depending
on whether the in vitro or in vivo directional weighting was used, again
illustrating the strong influence that the biological context in which the
signature was generated has on its subsequent performance in the
clinical datasets.

Additional file 12: Performance of the TSTSS in independent
breast cancer cohorts. Kaplan-Meier analyses were performed using
the R Package 'Survival' to assess the association of the
TGF-β-regulated gene sets with distant metastasis-free survival (DMFS)
or overall survival (OS) in three independent breast cancer datasets
that were not included in the GOBO meta-analysis as they used
different gene expression array platforms. Patient datasets were
dichotomized to higher than median expression (black) or lower than
median expression (grey) of the gene set. P values were determined
by the log-rank test. Performance of the TSTSS (in vivo weighting) is
shown for ER+ breast cancer datasets from (A) the Nederlands Kanker
Instituut (NKI: n = 249), (B) the Cancer Genome Atlas (TCGA) cohort
(n = 407) and (C) the BT2000/Metabric (n = 1,508) cohorts. (D) GSE6532
(Loi) is a component dataset from the GOBO cohorts using the
Affymetrix array platform that was reanalyzed using the same R
Package method for direct comparison.

Additional file 13: Correlation between the metaPCNA index or the 
metaEphrin index and the TSTSS in additional ER+ breast cancer 
cohorts. The metaPCNA index (A) is a surrogate for proliferation and the 
metaEphrin index (B) is a surrogate for ephrin pathway activation in normal 
cells. More details on the indices are given in Methods. The GSE6532 (Loi) 
dataset contains 262 ER+ tumors, and the Nederlands Kanker Instituut (NKI) 
dataset contains 249 ER+ tumors. The Spearman correlation coefficient is 
given.

Additional file 14: Network analysis on the genes of the TSTSS.
Ingenuity Pathway Analysis was performed to identify networks formed by
the 26 genes of the TSTSS, and the top two networks are shown. Network 1
(score 32) is associated with the following network functions: Cardiovascular
System Development and Function, Embryonic Development and Function.
Network 2 (score 29) is associated with Cell Cycle, Digestive System
Development and Function, and Cancer. Red indicates TGF-β/Smad3
target genes that were upregulated by TGF-β in vivo, and green
indicates downregulated target genes. White shows non-target genes
that were used to generate the networks.

Additional file 15: Relative expression of the TSTSS in different 
human breast cancer subtypes. Analyses were done using the GOBO 
algorithm applied to all breast cancers in the GOBO database. (A) TSTSS 
expression in breast cancers stratified by ER status. (B) TSTSS expression 
in breast cancers stratified by intrinsic molecular subtype. HER2, HER2 
amplified; Lum, luminal. The number of tumors in each category is given
at the top of the figure. ANOVA P values.
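
The log-rank comparison used for the Kaplan-Meier analyses in Additional files 11 and 12 can be sketched in a few lines. This is a minimal standard-library illustration of a two-group log-rank test, not the R 'Survival' package or the GOBO implementation; the function name and the follow-up times are hypothetical. Event flags are 1 for an observed event and 0 for censoring, and the two groups stand for tumors above and below the median signature score.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) on 1 d.f."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        events_here_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        events_here_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = events_here_a + events_here_b
        observed_a += events_here_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical follow-up times in years; every patient has an event here.
low_signature = ([1, 2, 3, 4], [1, 1, 1, 1])
high_signature = ([5, 6, 7, 8], [1, 1, 1, 1])
chi_square, p_value = logrank_test(*low_signature, *high_signature)
```

The sketch does not guard against datasets with no events, where the variance is zero; a production analysis would rely on an established survival library.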